Chunk-Level Reordering of Source Language Sentences with Automatically Learned Rules for Statistical Machine Translation
Authors
Abstract
In this paper, we describe a source-side reordering method based on syntactic chunks for phrase-based statistical machine translation. First, we shallow-parse the source language sentences. Then, reordering rules are automatically learned from source-side chunks and word alignments. During translation, the rules are used to generate a reordering lattice for each sentence. Experimental results are reported for a Chinese-to-English task, showing an absolute improvement of 0.5%–1.8% in BLEU score on various test sets and better computational efficiency than reordering during decoding. The experiments also show that reordering at the chunk level performs better than reordering at the POS level.
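The rule-learning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rule format (a chunk-tag sequence mapped to a permutation), and the use of the mean aligned target position to order chunks are all assumptions made for illustration. The abstract's method builds a lattice of alternative orders per sentence, whereas this sketch keeps only the single most frequent permutation for each tag sequence.

```python
from collections import Counter, defaultdict


def chunk_target_positions(chunk_spans, alignment):
    """Mean aligned target position of each source chunk (None if unaligned).

    chunk_spans: list of (start, end) source word indices, end exclusive.
    alignment:   list of (source_index, target_index) word-alignment links.
    """
    positions = []
    for start, end in chunk_spans:
        targets = [t for s, t in alignment if start <= s < end]
        positions.append(sum(targets) / len(targets) if targets else None)
    return positions


def extract_rule(chunk_tags, chunk_spans, alignment):
    """Return (tag_sequence, permutation), or None if no reordering is implied."""
    positions = chunk_target_positions(chunk_spans, alignment)
    if any(p is None for p in positions):
        return None  # hypothetical policy: skip examples with unaligned chunks
    # sorted() is stable, so chunks with equal positions keep source order.
    order = tuple(sorted(range(len(positions)), key=lambda i: positions[i]))
    if order == tuple(range(len(positions))):
        return None  # monotone: source order already matches target order
    return tuple(chunk_tags), order


def learn_rules(examples):
    """Keep the most frequent permutation observed for each chunk-tag sequence."""
    counts = defaultdict(Counter)
    for tags, spans, alignment in examples:
        rule = extract_rule(tags, spans, alignment)
        if rule is not None:
            counts[rule[0]][rule[1]] += 1
    return {lhs: perms.most_common(1)[0][0] for lhs, perms in counts.items()}


def apply_rule(chunks, chunk_tags, rules):
    """Reorder chunks if a rule matches the full tag sequence, else keep order."""
    order = rules.get(tuple(chunk_tags))
    if order is None:
        return list(chunks)
    return [chunks[i] for i in order]


# Toy example with placeholder words: the PP chunk aligns after the VP chunk.
example = (["NP", "PP", "VP"], [(0, 1), (1, 3), (3, 4)],
           [(0, 0), (1, 2), (2, 3), (3, 1)])
rules = learn_rules([example])
print(apply_rule(["w0", "w1 w2", "w3"], ["NP", "PP", "VP"], rules))
```

In the toy example, the learned rule moves the verb chunk in front of the prepositional chunk. A lattice-based variant would instead retain several weighted permutations per tag sequence and leave the final choice to the decoder.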
Similar resources
Syntax and Structure in Statistical Translation
In this paper, we describe a source-side reordering method based on syntactic chunks for phrase-based statistical machine translation. First, we shallow-parse the source language sentences. Then, reordering rules are automatically learned from source-side chunks and word alignments. During translation, the rules are used to generate a reordering lattice for each sentence. Experimental results ar...
The application of source language information in Chinese-English statistical machine translation
The quality of machine translation (MT) has been significantly improved by using statistical approaches. The integration of syntactic knowledge into a statistical MT system is still an open problem. This talk investigates the application of syntactic knowledge of the source language to the phrase-based MT system for translating Chinese into English. In this thesis, particular issues have been a...
A Reordering Approach for Statistical Machine Translation
This paper presents a Markov-based hierarchical reordering scheme for lexical reordering, intended for incorporation into a phrase-based statistical machine translation system. The goal is to reorder the words and phrases of the source language syntactic structure into their corresponding target language syntactic order to make translation easier. Without reordering during language translation, sentences can onl...
Syntax Based Reordering with Automatically Derived Rules for Improved Statistical Machine Translation
Syntax-based reordering has been shown to be an effective way of handling word order differences between source and target languages in Statistical Machine Translation (SMT) systems. We present a simple, automatic method to learn rules that reorder source sentences to more closely match the target language word order using only a source-side parse tree and automatically generated alignments. Th...
Syntactic Preprocessing for Statistical Machine Translation
We describe an approach to automatic source-language syntactic preprocessing in the context of Arabic-English phrase-based machine translation. Source-language labeled dependencies, which are word-aligned with target language words in a parallel corpus, are used to automatically extract syntactic reordering rules in the same spirit as Xia and McCord (2004) and Zhang et al. (2007). The extracted ...